Virus Evolution

Oxford University Press (OUP)

Preprints posted in the last 90 days, ranked by how well they match Virus Evolution's content profile, based on 140 papers previously published here. The average preprint has a 0.07% match score for this journal, so anything above that is already an above-average fit.

1
Deep mutational scanning of recent SARS-CoV-2 variants highlights changing amino acid preferences within epistatic hotspot residues

Taylor, A.; Starr, T. N.

2026-03-13 microbiology 10.64898/2026.03.11.711006 medRxiv
Top 0.1%
22.6%

Deep mutational scans across receptor-binding domains (RBDs) of diverging SARS-CoV-2 variants reveal ongoing changes to the effects of mutations, a phenomenon known as epistasis. Careful accounting for these altered mutational effects is important in viral surveillance and forecasting, and more broadly, for understanding the impacts of epistasis on real-world viral evolutionary trajectories. Using a yeast-display RBD deep mutational scanning (DMS) platform, we measure the impacts of virtually all single amino acid mutations and single-residue deletions in the Omicron KP.3.1.1 and LP.8.1 RBDs on folded RBD expression and binding affinity for the human ACE2 receptor. Our comprehensive maps reveal patterns of evolutionary accessibility and constraint at single-residue resolution and when compared to prior datasets, highlight sites whose amino acid preferences continue to change across viral variants. Notably, sites 455, 456, and 493 - which have exhibited repeated substitutions and epistatic dependencies across Omicron subvariants going back to BA.1 - again demonstrate altered patterns of mutational accessibility and constraint. Therefore, it appears that these hotspots of repeated RBD evolution have not yet converged on fixed amino acid solutions, but instead remain sites of ongoing epistatic reconfiguration. We compare our measurements of direct RBD:ACE2 affinity with recently published measurements of mutation impacts on ACE2 binding in the full quaternary spike context, which also integrates the effects of spike conformational dynamics; our analysis uncovers mutations like H505W that could favor adoption of the down/closed RBD conformation as a viral strategy for future antigenic evolution.

2
A Cophylogenetic Approach for Virus-Host Interaction Prediction

Chowdhury, M. Z. U. S.; Murali, T. M.; Sashittal, P.

2026-02-27 evolutionary biology 10.64898/2026.02.26.708038 medRxiv
Top 0.1%
22.3%

Advances in metagenomics have rapidly expanded viral discovery, revealing vast diversity across Earth's virosphere. Yet most virus-host interactions--i.e., which viruses infect which hosts--remain unrecorded. Identifying these interactions is essential for anticipating zoonotic spillover events and advancing biomedical applications such as bacteriophage therapy. However, the sheer diversity of viruses and hosts makes comprehensive experimental mapping infeasible, motivating the need for computational approaches. Most existing prediction methods rely on supervised learning strategies that use sequence-derived features, such as codon usage bias or k-mer frequencies, and do not model the coevolutionary processes that shape virus-host interactions. This limits their ability to generalize and the evolutionary interpretability of their predictions. We introduce CoEvoLink, a framework for predicting virus-host interactions that integrates sequence-based evidence with phylogenetic signal by explicitly modeling the coevolutionary histories of viruses and hosts. CoEvoLink infers likely but unobserved interactions by minimizing the number of evolutionary events required to explain them, yielding the most parsimonious interaction under a coevolutionary model. This formulation generalizes classical maximum parsimony, typically defined on a single phylogeny, by jointly optimizing parsimony across both virus and host phylogenies. Sequence-based information is incorporated by assigning a cost to each potential interaction that reflects its likelihood based on genomic features. By drawing a connection between computing parsimony on interaction matrices and maximum parsimony on phylogenetic networks, we derive a polynomial-time algorithm that balances parsimony with sequence-derived prediction cost. We demonstrate the effectiveness of CoEvoLink on simulated data under diverse coevolutionary models. 
Applying CoEvoLink, we identified putative bat hosts of betacoronaviruses that have not yet been cataloged in the VIRION database. On a benchmark derived from metagenomic sequencing data, we demonstrate that CoEvoLink improves the performance of existing phage-host prediction tools using cophylogenetic information. Code availability: https://github.com/sashittal-group/CoEvoLink Note: This paper is accepted at RECOMB 2026 (30th Annual International Conference on Research in Computational Molecular Biology).
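
The abstract describes CoEvoLink as a generalization of classical maximum parsimony on a single phylogeny. As a rough illustration of that classical starting point only, the sketch below applies Fitch's small-parsimony algorithm to a binary "infected / not infected" character on a toy host tree and asks which state for an unobserved host requires fewer changes. This is not CoEvoLink's algorithm: the tree, the host names, and the helper `changes_if` are hypothetical, and no sequence-derived cost or virus phylogeny is modeled.

```python
def fitch_score(tree, leaf_states):
    """Fitch small parsimony on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    leaf_states: dict mapping each leaf name to its character state.
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_changes = fitch_score(tree[0], leaf_states)
    right_set, right_changes = fitch_score(tree[1], leaf_states)
    changes = left_changes + right_changes
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, changes
    # Children disagree: take the union and count one change.
    return left_set | right_set, changes + 1


# Hypothetical host phylogeny; 1 = virus recorded in host, 0 = not recorded.
host_tree = ((("bat_A", "bat_B"), "bat_C"), "rodent_D")
observed = {"bat_A": 1, "bat_C": 1, "rodent_D": 0}


def changes_if(state_of_bat_b):
    """Parsimony score if the unobserved host bat_B is given this state."""
    states = dict(observed, bat_B=state_of_bat_b)
    return fitch_score(host_tree, states)[1]


for candidate in (0, 1):
    print(candidate, changes_if(candidate))
```

In this toy case, treating `bat_B` as a host needs fewer changes than treating it as a non-host, which is the sense in which parsimony can suggest an unrecorded interaction.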

3
Inferring the multi-host fitness landscape of endive necrotic mosaic virus from cross-inoculation experiments

Roques, L.; Papaix, J.; Martin, G.; Forien, R.; Lenormand, T.; Soubeyrand, S.; Berthier, K.; Moury, B.

2026-03-23 evolutionary biology 10.64898/2026.03.18.712764 medRxiv
Top 0.1%
22.2%

Fitness landscapes offer a compact representation of adaptation, yet are rarely inferred from multi-environment data. We present a Bayesian approach to infer a multi-host phenotypic fitness landscape from cross-inoculation assays by linking successful infection probabilities to Fisher's geometrical model under strong selection and weak mutation. The model estimates (i) the distance matrix among host-specific phenotypic optima, (ii) host-specific permissiveness through the widths of fitness peaks on target hosts, and (iii) host-specific differences in the efficiency with which phenotypic suitability translates into successful infection. We apply the approach to an experimental evolution dataset for endive necrotic mosaic virus evolved on five Asteraceae hosts and challenged in a full cross-inoculation design. The inferred landscape can be visualized as a phenotypic map of the host community, revealing pronounced heterogeneity in host permissiveness and a geometry broadly concordant with host phylogeny. By grounding assay-derived distances in an explicit mechanistic model, the approach provides a parsimonious representation of multi-host constraints that can be used to discuss establishment barriers and potential springboard hosts in heterogeneous communities. More broadly, it offers a general method for inferring effective fitness landscapes from sparse multi-environment data.
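
To make the three estimated quantities concrete, here is a minimal sketch of how a distance between optima, a peak width, and an efficiency term could combine into an infection probability. The Gaussian fitness function is the usual textbook form of Fisher's geometrical model; the saturating link `1 - exp(-efficiency * fitness)` and all function names are assumptions made for illustration and are not taken from the preprint.

```python
import math


def gaussian_fitness(distance, width):
    """Fitness of a phenotype at a given distance from a host's optimum.

    Assumes a Gaussian peak: wider peaks (more permissive hosts) penalize
    the same phenotypic distance less.
    """
    return math.exp(-distance ** 2 / (2.0 * width ** 2))


def infection_probability(distance, width, efficiency):
    """Hypothetical link from fitness to probability of successful infection.

    'efficiency' stands in for the host-specific term that converts
    phenotypic suitability into infection success (assumed functional form).
    """
    return 1.0 - math.exp(-efficiency * gaussian_fitness(distance, width))


# A virus adapted to host 1 (sitting at host 1's optimum) challenged on two
# target hosts at the same phenotypic distance but with different widths.
print(infection_probability(distance=1.0, width=0.5, efficiency=3.0))
print(infection_probability(distance=1.0, width=2.0, efficiency=3.0))
```

Under these assumptions, the same phenotypic distance yields a higher infection probability on the more permissive (wider-peak) host.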

4
Mapping the global distribution and spread of the Plasmodium vivax-associated virus MaRNAV-1

Petrone, M. E.; Charon, J.; Parry, R. H.; Grigg, M. J.; Piera, K. A.; Westaway, J. A.; Shioda, K.; Russell, B.; Price, R. N.; Williams, T.; Kenangalem, E.; McCarthy, J. S.; Barber, B. E.; Holmes, E. C.; Anstey, N. M.

2026-03-02 evolutionary biology 10.64898/2026.02.26.708358 medRxiv
Top 0.1%
21.9%

Matryoshka RNA virus 1 is a bi-segmented and single-stranded RNA virus associated with Plasmodium vivax, a cause of human malaria. Little has been uncovered about the epidemiology and ecology of this virus since its discovery in 2019. To address this, we used a combination of primary and publicly available metatranscriptomic data to map the geographic distribution and host associations of MaRNAV-1. We detected this virus throughout Southeast Asia, in parts of South America, and, for the first time, in Oceania. Despite its broad distribution, MaRNAV-1 was found exclusively in metatranscriptomes containing P. vivax, suggesting that there is a specific virus-host relationship that has shaped the evolutionary history of this virus. We were unable to estimate the emergence date of the MaRNAV-1 lineage; however, phylogeographic mapping analysis suggested that MaRNAV-1 may have radiated from Southeast Asia. Our findings have both evolutionary and public health implications and can serve as the basis for future investigations in these fields.

5
Insights into goatpox virus and sheeppox virus genomes from pangenome graphs

Downing, T.

2026-03-31 genomics 10.64898/2026.03.28.714820 medRxiv
Top 0.1%
19.2%

The Capripoxviruses (CaPV) comprise three species: goatpox virus (GTPV), sheeppox virus (SPPV) and lumpy skin disease virus (LSDV). They are large double-stranded DNA viruses with highly conserved core genomes and variable terminal regions. Although previous studies have described variation in CaPV gene content, their broader population structure and the contribution of non-coding and structural variation remain opaque. This study investigated the genomic diversity and evolutionary history of GTPV and SPPV using an integrative framework combining phylogenetics, pangenome variation graphs (PVGs), and gene-specific analyses. We found marked differences in population structure between the two viruses. GTPV comprised three deeply divergent and genetically stable lineages with limited evidence of recent gene flow, whereas SPPV had weaker clade separation consistent with an ancestral bottleneck followed by recent population expansion. PVG-based analyses indicated that GTPV has a comparatively closed pangenome, while SPPV remains open, particularly at the genome termini. Structural and haplotypic variation was concentrated at the inverted terminal repeats (ITRs), which moderate host immunity and specificity. In several lineages, extended putative ORFs spanning adjacent terminal genes were observed, indicating recurrent structural plasticity at the genome ends. Patterns of gene-specific conservation and divergence highlighted loci under strong constraint and lineage-specific structural changes that may contribute to host specificity. Together, these results demonstrate how graph-based genome models complement gene-based analyses in resolving poxvirus genome evolution and provide a resource for improved comparative and population genomic studies of large DNA viruses.
Significance: Capripoxviruses are economically important livestock pathogens, yet the genomic mechanisms underlying their diversification and host specificity remain poorly resolved. By applying pangenome variation graphs alongside phylogenetic and gene-level analyses, this study reveals fundamental differences in how goatpox and sheeppox viruses have evolved. Goatpox virus had a deeper, more stable lineage structure, whereas sheeppox virus was more recent and diverse. Importantly, structural variation at the inverted terminal repeats emerged as a major driver of genomic diversity, including lineage-specific haplotypes and variable gene structures. These findings demonstrate the value of graph-based genome representations for resolving complex variation in large DNA viruses and provide a framework for improving genomic surveillance, comparative analyses, and future investigations into host range, virulence and tropism.

6
RNA virus discovery in Australian camelids reveals divergent picornaviruses and the convergent evolution of upstream ORFs

Takada, K.; Mifsud, J. C.; Hirano, J.; Harvey, E.; Sadiq, S.; Lang, B. J.; Matsuura, Y.; Holmes, E. C.

2026-02-20 microbiology 10.64898/2026.02.19.706906 medRxiv
Top 0.1%
18.2%

Invasive species can impact viral ecology, evolution and emergence by acquiring and disseminating viruses absent from native hosts. However, the extent to which invasive species harbour previously unrecognized RNA viruses and transmit these to native species is uncertain. We performed metatranscriptomic sequencing of invasive camelids in Australia and identified several previously undescribed vertebrate-associated RNA viruses, including an astrovirus closely related to avian-associated viruses suggesting a recent host jump. We also identified highly divergent picornaviruses that differed sufficiently from recognized taxa in genome organization and polyprotein phylogeny to establish a new genus. Notably, one virus encoded a putative upstream ORF (uORF) in the 5' genomic region. Across the Picornaviridae, putative uORF gain and loss appear to have occurred multiple times independently. In addition, although most of these uORF-encoded proteins exhibited little to no amino acid sequence homology, a subset showed overlapping ranges of secondary structure composition and intrinsic disorder, and when heterologously expressed, these proteins were translated and triggered reproducible transcriptional responses in a cell line. While no single pathway was uniformly affected across all uORFs, distinct uORFs from divergent lineages consistently perturbed overlapping sets of cellular pathways, supporting broadly analogous functional effects despite a lack of sequence homology. These findings demonstrate that uORFs represent a recurrent and selectable functional module within RNA virus genomes, suggest that the upstream genomic position itself constitutes a "hotspot" for the repeated acquisition of a functional module, and provide experimental evidence that their functional properties have converged across evolutionarily independent lineages. 
Author Summary: RNA viruses evolve under strong genomic constraints, forcing them to repeatedly adopt similar solutions to common challenges posed by their host environment. Invasive species can impact viral ecology, evolution and emergence by acquiring and disseminating non-native viruses. By characterizing RNA viruses infecting invasive camelids in Australia, we discovered previously unrecognized RNA viruses and recurrent patterns of genome evolution. Notably, we identified a small additional gene located in the 5' genomic region, known as an upstream open reading frame (uORF). uORFs were independently acquired and lost across multiple viral lineages, revealing that the same genomic region can be repeatedly exploited to acquire new functions. Although uORF-encoded proteins share little or no amino acid sequence homology, some of the encoded proteins showed comparable secondary structure composition and induced overlapping host cellular responses when expressed in cells. Hence, RNA viruses that have followed different evolutionary paths can converge on similar functional strategies.

7
A newly emergent N1 neuraminidase associated with clade 2.3.4.4b highly pathogenic avian influenza A(H5) viruses in North America

Wersebe, M. J.; Paterson, N. M.; Hassell, N.; Zheng, X.-y.; Rambo-Martin, B. J.; Frederick, J. C.; Lacek, K. K.; Sullivan, A. H.; Kirby, M.; Kondor, R.; Jang, Y.; Schatzman, S.; Di, H.; Davis, C. T.

2026-03-10 infectious diseases 10.64898/2026.03.09.26347929 medRxiv
Top 0.1%
17.3%

We investigated the evolutionary history of the newly emergent neuraminidase (am4N1) associated with the D1.1 and D1.2 genotypes of highly pathogenic avian influenza A(H5N1) viruses in North America. Phylogenetic inference places am4N1 in a sister clade to Eurasian avian, swine, and human A(H1N1)pdm09 viruses and distinct from 1918, pre-2009 human seasonal, and classical swine A(H1N1) lineages. Am4N1 descends from diverse avian N1 genes endemic to the Americas. Phylodynamic analysis indicates a monophyletic am4N1 lineage with numerous introductions of viruses carrying the am4N1 gene likely originating from western Canada into the United States during emergence of the D1.1 and D1.2 genotypes. The lineage has diversified and accumulated deletions in the stalk domain. Despite amino acid divergence, structural modeling shows conserved neuraminidase architecture in the globular head. Given its distinct ancestry and amino acid sequence, further studies are needed to assess cross-reactivity of antibodies from prior human A(H1N1)pdm09 infections.

8
Synonymous substitution rate slowdown preceding the emergence of SARS-CoV-2 variants and during persistent infections

Havens, J. L.; Gangavarapu, K.; Wang, J. C.; Taki, F.; Luoma, E.; Pekar, J. E.; Amin, H.; Di Lonardo, S.; Omoregie, E.; Hughes, S.; Andersen, K. G.; Vasylyeva, T. I.; Suchard, M. A.; Wertheim, J. O.

2026-01-28 epidemiology 10.64898/2026.01.26.26344861 medRxiv
Top 0.1%
14.8%

The emergence of variants has shaped the COVID-19 pandemic. The lack of directly observed precursors to these variants has led to proposals that variants emerge from either persistent infections, transmission in non-human animal populations after reverse-zoonosis, or cryptic transmission in the human population. We investigated the origin of variants by analyzing the molecular clock and rate of nonsynonymous and synonymous substitutions in SARS-CoV-2 circulating in the human population, in persistently infected individuals, in non-human animals, and along variant stems: the branches preceding emergence of SARS-CoV-2 variants (Alpha, Beta, Gamma, Delta, Epsilon, Iota, B.1.637, Mu, and Omicron: BA.1, BA.2/BA.4/BA.5). Along the variant stems we find evidence for an acceleration in the nonsynonymous substitution rate, as compared with the nonsynonymous substitution rate along the branches that represent the genetic diversity of circulating virus. We also find evidence for a slowdown in the synonymous substitution rate preceding the emergence of multiple named variants (e.g., Beta, Delta, Iota, Mu, Omicron BA.1); a similar pattern was observed in some individuals with persistent infections, suggesting that the viral replication rate can slow down during persistent infection. However, the synonymous rate slowdown was not observed for all variants, with some exhibiting an increase in synonymous substitution rates preceding their emergence compared with typical viral transmission (e.g., Alpha, Epsilon). The similarity in evolutionary dynamics preceding some variant emergence and during persistent infections supports the hypothesis that persistent infections were the likely source of many COVID-19 variants.

9
Adaptive Remodeling of the MPXV B21R Receptor-Binding Domain Enhances DC-SIGN Interaction and Identifies Conserved CTL Targets for T-Cell Vaccine Development

Kumar, S.; Harnam, A. S.; Kumar, S.; Paweska, J. T.; Abdel-Moneim, A. S.; Saxena, S. K.

2026-03-02 microbiology 10.64898/2026.03.02.708970 medRxiv
Top 0.1%
14.3%

Global Mpox transmission is a growing public health concern, as the number of cases has increased progressively since its first major outbreak in 1996. Therefore, understanding its global epidemiological transformation and its underlying mechanism is crucial to decipher the immune evasion strategies exhibited by recent MPXV strains. In the present study, we analyzed the trend of global Mpox epidemiology and identified the current multinational outbreak, which began in Africa in 2017. To explore the molecular basis of this transformation, we considered the B21R protein of MPXV, one of the important MPXV structural proteins, as it may have played a role in viral adaptation and immune escape. Our data show that Mpox has significantly transformed from 1996 to 2025: MPXV strains from 2022, 2023, and 2024 are closely clustered, whereas the 2025 strain is closely related to the 2017 MPXV strain. Structural modeling of B21R using AlphaFold uncovers a modular architecture comprising a putative receptor-binding N-terminal region (p-RBD), a central ectodomain, and a membrane-anchored C-terminal segment. Mapping solvent accessibility across the full-length B21R protein revealed that the p-RBD exhibited the highest solvent exposure compared to other B21R protein domains. Because DC-SIGN is a potential cellular receptor for entry into the target host cell, we evaluated the interaction of the B21R p-RBD with the CRD region of DC-SIGN, which showed a gradual increase in binding affinity with acquired mutations. Moreover, we found alterations in the O-linked glycosylation sites in the p-RBD region of the B21R protein, which is crucial for MPXV entry into the host cell. Importantly, we observed significant changes in linear B cell epitopes of the p-RBD, impacting humoral immunity, while CTL epitopes remained conserved. Hence, we showed the significance of the B21R p-RBD as a T-cell-based vaccine candidate for prevention of Mpox. 
This study provides novel insights into the recent global transmission of Mpox and explores a plausible mechanism of humoral immune escape through progressive mutations in the B21R protein, as well as the potential development of a T-cell-based vaccine candidate. Significance statement: Our study describes the global transmission dynamics of Mpox and the immune evasion strategy of recent MPXV strains. Epidemiological transformation analysis revealed that the current multinational outbreak of Mpox originated in Africa in 2017, highlighting the expanding global footprint of MPXV. Our analysis based on the B21R protein shows evolutionary adaptation of MPXV associated with progressive mutations responsible for increased affinity towards the DC-SIGN receptor, a potential reason for increased infectivity. Importantly, alterations in O-linked glycosylation sites and linear B cell epitopes indicate potential antigenic drift in the recent Mpox outbreaks, consistent with immune escape strategies. These findings provide insights into Mpox epidemiology and the molecular basis of MPXV adaptation, informing vaccine design, therapeutic strategies, and improved countermeasures against future outbreaks.

10
Landscape viromics of introduced honeybees and bumblebees reveal distinct environmental and host-specific effects

Haque, S.; Remnant, E. J.; Damayo, J. E.; Ponton, F.; Dudaniec, R. Y.

2026-03-30 evolutionary biology 10.64898/2026.03.26.714648 medRxiv
Top 0.1%
14.1%

Understanding how viral communities vary across co-occurring hosts and environments is essential for assessing species-specific viral risks under changing land use and climate. This is particularly relevant for managing introduced bees, which face persistent viral threats themselves, as well as transmitting plant viruses. Here, we compare RNA viromes of the long-established honeybee (Apis mellifera, introduced to Tasmania in 1831) and the more recent invader, the bumblebee (Bombus terrestris, invasive since 1992), across 14 Tasmanian sites - an island still free of the viral vector, Varroa destructor. Using a metatranscriptomic approach on total RNA from whole bees, we identified insect- and plant-associated viruses and inferred phylogenetic patterns of insect viral sharing, divergence, and potential cross-species transmission. We also assessed spatial and environmental drivers of viral composition, diversity, and richness. Geographic longitude, precipitation, temperature, and pasture percentage influenced the total, insect-, and plant-associated viromes of B. terrestris. In contrast, for A. mellifera, only precipitation and temperature were associated with insect and plant viral alpha diversity and community composition. Phylogenetic analyses revealed that Black Queen Cell virus in A. mellifera from Tasmania has diverged from mainland Australian sequences, and two distinct sub-strains of Lake Sinai virus 1 were shared by both bee species. Lake Sinai virus 3 showed evidence of interspecies transmission between A. mellifera and B. terrestris. Notably, this study provides the first detection of Moku virus in Australian bees and globally in bumblebees, suggesting potential interspecies transmission among social Hymenoptera. Overall, our findings demonstrate local viral diversification and reveal that B. terrestris viromes are more strongly shaped by environmental factors than those of A. mellifera, underscoring the importance of monitoring invasive pollinators as reservoirs and vectors of viral emergence.

11
Estimation of Annual Exposures and Antibody Kinetics Against Norovirus GII.4 Variants from English Serology Data, 2007-2012.

O'Reilly, K.; Hay, J. A.; Lindesmith, L.; Allen, D.; Hue, S.; Debbink, K.; Kucharski, A.; Baric, R.; Breuer, J.; Edmunds, W. J.

2026-03-11 epidemiology 10.64898/2026.03.09.26347737 medRxiv
Top 0.1%
12.4%

Norovirus in humans is highly contagious, causing diarrhoea and vomiting, and is especially common in young children. Winter incidence varies annually, and previous research indicates that the change of dominant norovirus variant is followed by high incidence, but having a clear mechanism to explain this observation could support better prediction of epidemics. Here we analyse unique norovirus serology blockade data from 656 children in England collected via opportunistic sampling between 2007 and 2012 using a mathematical model of multi-variant antibody kinetics to infer metrics such as annual attack rates and age-specific infection rates. Analysis reveals that overall infection rates were 204 infections per 1000 person-years (posterior median; 95% credible interval: 188-221). Infection rates were lowest in children aged under 1 year at 164 infections per 1000 person-years (95% CrI: 121-209) and highest in children aged 5 years and older, at 252 infections per 1000 person-years (95% CrI: 212-288). The annual attack rate was highest in 2002, coincident with transition of the dominant variant to Farmington Hills, and high attack rates are frequently observed with emergence of new variants, but not always. Parameter estimates indicate moderate evidence for the immune imprinting hypothesis: a stronger antibody response to variants encountered earliest in life. Infection rates estimated here from serology are higher than incidence reported in similar settings based on disease only, which is consistent with considerable asymptomatic infection. The combined use of multi-variant antibody data and a mathematical model provides key insights into the natural history of norovirus variants, which can inform epidemic planning.

12
Predicting the antigenic evolution of seasonal influenza viruses using phylogenetic convergence

Turner, S. A.; Pattinson, D. J.; Fouchier, R. A. M.; Smith, D. J.

2026-04-10 evolutionary biology 10.64898/2026.04.10.717627 medRxiv
Top 0.1%
10.6%

The antigenic evolution of human seasonal influenza viruses is primarily driven by single amino acid substitutions immediately adjacent to the receptor binding site in the hemagglutinin (HA) protein. The ability to predict these substitutions would allow vaccine strains to be selected with an understanding of likely future antigenic variation. Here, we estimate the effect of HA substitutions on viral fitness using measurements of convergent evolution in a large phylogeny. We show that the substitutions which have historically caused major antigenic changes in H3N2 influenza viruses were nearly always one of few substitutions near the HA receptor binding site estimated to be under positive selection in sequences collected before the antigenic transition, based on convergent acquisition of the substitution in multiple independent lineages. Furthermore, this signal predates the establishment of the major clade containing the antigenic substitution by more than one year, so is highly informative for prospective prediction.

13
Evolutionary analysis of V protein pseudogenization in an RNA editing-deficient paramyxovirus

Rakib, T. M.; Akter, L.; Matsumoto, Y.

2026-04-08 evolutionary biology 10.64898/2026.04.06.716634 medRxiv
Top 0.1%
10.3%

In most paramyxoviruses, RNA editing in the P gene enables expression of the V protein. Human parainfluenza virus type 1 (HPIV-1) differs from most paramyxoviruses in that it lacks RNA editing and does not produce a functional V protein, although its genome retains sequences corresponding to the ancestral V reading frame. Here, we analyzed all HPIV-1 genome sequences available in the NCBI GenBank database to assess the evolutionary state of this V protein-specific region. Using Sendai virus (SeV) as a closely related reference with an identical P gene length, we defined a pseudo-V reading frame by virtually inserting a single nucleotide at the conserved RNA editing site. In this pseudo-V frame, HPIV-1 showed a marked excess of stop codons within the 253-amino-acid region corresponding to the post-editing sequence, far exceeding expectations under random codon usage. This pattern was not observed in other viral genes analyzed under the same definition, nor in SeV, nor was it reproduced by in silico evolutionary simulations under constraints preserving the primary open reading frame. These results are consistent with a virus-specific evolutionary trajectory following the loss of RNA editing, rather than with generic coding constraints acting on overlapping reading frames.
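
The pseudo-V frame analysis can be pictured with a small sketch: insert one nucleotide at an editing site, read codons in the resulting frame, and compare the stop-codon count with a naive expectation. Everything here is illustrative: the toy sequence, the insertion position, the choice of `G` as the inserted base, and the uniform-codon null (3 stop codons out of 64) are assumptions, and the preprint's actual null model and simulations may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def insert_nucleotide(seq, position, nucleotide="G"):
    """Return seq with one nucleotide inserted before index 'position'."""
    return seq[:position] + nucleotide + seq[position:]


def count_stops(seq, start=0):
    """Count stop codons reading complete codons from 'start' onward."""
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            count += 1
    return count


def expected_stops_uniform(n_codons):
    """Expected stop count if all 64 codons were equally likely (naive null)."""
    return n_codons * 3 / 64


# Toy coding sequence (hypothetical), with no stop codon in its own frame.
toy_gene = "ATGCTTAACGTGACC"
pseudo_frame = insert_nucleotide(toy_gene, position=3)

print(count_stops(toy_gene))        # stops in the primary frame
print(count_stops(pseudo_frame))    # stops in the shifted (pseudo) frame
print(expected_stops_uniform(253))  # naive expectation for a 253-codon region
```

A real analysis would apply the same counting to aligned P gene sequences downstream of the editing site and would use a null that preserves the primary open reading frame, as the abstract describes.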

14
A novel genetically distinct Amdoparvovirus in Sorex araneus in the United Kingdom highlights an unexplored ancestral link

Briggs, T. C.; Maskell, D.; Henderson, D.; Graham, C.; Mansfield, B.; Jorge, D.; Schlacter, A.-L.; Bernard, M.; Callaway, R.; Osmond, D.; Amaya-Cuesta, J.; Pfaff, F.; Ashpitel, H.; Smith, G. C.; Gupta, Y. K.; McElhinney, L. M.; Schilling, M.

2026-02-06 evolutionary biology 10.64898/2026.02.04.703478 medRxiv
Top 0.1%
10.2%

Amdoparvoviruses have historically been documented almost exclusively in carnivores, with recent detections in bats. However, endogenous viral elements in rodent genomes suggest a more ancient and taxonomically broader evolutionary history. Despite this, small mammals have never been systematically surveyed for extant amdoparvovirus infections. In this study, we used whole genome sequencing to screen four different shrew species and wild American mink in the UK, which may act as a reservoir host for amdoparvoviruses. We identified a highly divergent amdoparvovirus in native common shrews (Sorex araneus) from northern England, tentatively named Shrew parvovirus 1 (SP 1). Classical amdoparvovirus sequences were also detected in wild American mink (Neogale vison), confirming the presence of known amdoparvovirus strains in UK mustelids. Phylogenetic analysis revealed that the shrew virus, SP 1, forms a distinct clade, suggesting ancient divergence or long-term cryptic circulation in small mammal reservoirs. These findings fundamentally challenge the hypothesis that amdoparvoviruses are carnivore-restricted pathogens and underscore the importance of systematic wildlife surveillance for understanding viral host range evolution and assessing spillover risks.

15
Tracking and predicting the dynamics of HIV-1 epidemics in France using virus genomic data

Colliot, L.; Garrot, V.; Petit, P.; Zhukova, A.; Chaix, M.-L.; Mayer, L.; Alizon, S.

2026-04-24 epidemiology 10.64898/2026.04.21.26351380 medRxiv
Top 0.1%
10.1%

Understanding the dynamics of HIV epidemics is important to control them effectively. Classical methods that mainly rely on occurrence data are limited by the fact that an unknown part of the epidemic eludes sampling. Since the early 2000s, phylodynamic methods have enabled the estimation of key epidemiological parameters from virus genetic sequence data. These methods have the advantage of being less sensitive to partial sampling and of providing insights about epidemic history that even predates the first samples. In this study, we analysed 2,205 HIV sequences from the French ANRS PRIMO C06 cohort. We identified and were able to reconstruct the temporal dynamics of two large clades that represent the HIV-1 epidemics in the country. Using Bayesian phylodynamic inference models, we found that the first clade, from subtype B, originated at the end of the 1970s, grew rapidly during the 1980s before decreasing from 2000 to 2015 and stagnating since then. The second clade, from circulating recombinant form CRF02_AG, emerged and spread in the 1980s, grew again in the early 2000s, before declining slightly. We also estimated key epidemiological parameters associated with each clade. Finally, using numerical simulations, we investigated prospective scenarios and assessed the possibility of meeting the 2030 UNAIDS targets. This is one of the rare studies to analyse the HIV epidemic in France using molecular epidemiology methods. It highlights the value of routine HIV sequence data for studying past epidemic trends or designing public health policies.

16
SARS-CoV-2 Introductions into Lao PDR Revealed by Genomic Surveillance, 2021-2024

Panapruksachat, S.; Troupin, C.; Souksavanh, M.; Keeratipusana, C.; Vongsouvath, M.; Vongphachanh, S.; Vongsouvath, M.; Phommasone, K.; Somlor, S.; Robinson, M. T.; Chookajorn, T.; Kochakarn, T.; Day, N. P.; Mayxay, M.; Letizia, A. G.; Dubot-Peres, A.; Ashley, E. A.; Buchy, P.; Xangsayarath, P.; Batty, E. M.

2026-04-13 epidemiology 10.64898/2026.04.09.26349480 medRxiv
Top 0.1%
10.1%

We used 2492 whole genome sequences from Laos to investigate the molecular epidemiology of SARS-CoV-2 from 2021 through 2024, covering the major waves of COVID-19 disease in Laos including time periods of travel restrictions and after relaxation of travel across international borders. We identify successive waves of COVID-19 caused by shifts in the dominant lineage, beginning with the Alpha variant in April 2021 and continuing through the Delta and Omicron variants. We quantify a shift from a small number of viral introductions responsible for widespread transmission in early waves to a larger number of introductions for each variant after travel restrictions were lifted, and identify potential routes of introduction into the country. Our study underscores the importance of genomic surveillance to public health responses to characterize viral transmission dynamics during pandemics.

17
Population-scale discovery and analysis of non-reference endogenous retrovirus insertions in wild house mice

Yano, T.; Takada, T.; Fujiwara, K.; Watabe, D.; Hirose, S.; Masuya, H.; Endo, T.; Osada, N.

2026-02-20 evolutionary biology 10.1101/2025.09.23.678169 medRxiv
Top 0.1%
10.0%

Endogenous retroviruses (ERVs) represent a major source of structural variation in mammalian genomes, yet their diversity in wild populations remains poorly understood. Here, we conduct a comprehensive genome-wide survey of non-reference ERV insertions in wild house mice (Mus musculus) to characterize their distribution and evolutionary dynamics. Using a newly developed bioinformatics pipeline, we detected and annotated over 100,000 non-reference ERV insertions from short-read sequencing data across 163 wild mouse genomes. Our analyses revealed marked differences in ERV insertion patterns among subspecies and populations, including variation in genomic localization and population-specific polymorphisms. These heterogeneous patterns suggest distinct evolutionary histories and host-retrovirus interactions across populations. For instance, we describe the distribution of the ERV-derived Fv4 locus, which shows subspecies-restricted occurrence and confers resistance to murine leukemia viruses (MLVs). Several lines of evidence showed that the spread of Fv4 insertions in the Korean population has been driven by adaptive introgression from neighboring populations. Our study provides the first large-scale population genomic scan of ERV diversity in wild house mice. By cataloguing extensive polymorphism in non-reference ERV insertions, our results highlight the role of ERVs as dynamic genomic elements that contribute to structural variation and adaptive evolution.

Article Summary: Endogenous retroviruses (ERVs) are viral sequences embedded in animal genomes that can create structural genetic variation. In this study, we conducted a genome-wide survey of non-reference ERV insertions in 163 wild house mice using short-read sequencing data and a newly developed computational pipeline. We identified more than 100,000 polymorphic ERV insertions and found substantial differences among subspecies and geographic populations. One example, the ERV-derived Fv4 locus, illustrates how ERV variation can influence the genetic pattern of polymorphisms in the species. These results demonstrate that ERVs are dynamic genomic elements that contribute to population divergence and adaptive evolution.

18
The emergence and molecular evolution of H5N1 influenza viruses in United States dairy cattle

Pekar, J. E.; Gangavarapu, K.; Crespo-Bellido, A.; Peacock, T. P.; Wertheim, J. O.; Dudas, G.; Joy, J. B.; Chand, M.; Debarre, F.; Gangavarapu, P.; Goldhill, D. H.; Groves, N.; Ji, X.; Malpica Serrano, L.; Moncla, L.; Rasmussen, A. L.; Ruis, C.; Venkatesh, D.; Kraemer, M. U. G.; Pybus, O. G.; Andersen, K. G.; Suchard, M. A.; Nelson, M. I.; Lemey, P.; Worobey, M.; Rambaut, A.

2026-04-01 evolutionary biology 10.64898/2026.03.30.713641 medRxiv
Top 0.1%
9.9%

Prior to 2024, highly pathogenic avian influenza H5N1 clade 2.3.4.4b viruses circulated predominantly in wild birds and poultry. In 2024 and 2025, 2.3.4.4b genotypes B3.13 and D1.1 were detected in United States dairy cattle. Using whole-genome and segment-specific phylodynamic inference, we estimate that B3.13 and D1.1 spilled over from wild birds into dairy cattle in late 2023 and late 2024, respectively. Spillover occurred shortly after the formation of the reassortant genotypes and was followed by months of cryptic transmission prior to detection. We found that both B3.13 and D1.1 evolved at higher rates in cattle relative to birds, primarily due to relaxed purifying selection. Site-specific analyses identified genomic sites under positive selection in cattle relative to birds, indicating adaptation and likely contributing to improved viral fitness after spillover. Intensified genomic surveillance in dairy cattle is essential as population immunity introduces additional selection pressures, with ever-changing risk for human emergence.

19
PREMISE: A Quality-Aware Probabilistic Framework for Pathogen Resolution and Source Assignment in Viral mNGS

Vijendran, S.; Dorman, K.; Anderson, T. K.; Eulenstein, O.

2026-03-18 bioinformatics 10.64898/2026.03.15.711921 medRxiv
Top 0.1%
8.9%

The circulation of Influenza A viruses (IAVs) in wildlife and livestock presents a significant public health threat due to their zoonotic potential and rapid genomic diversification. Accurate classification of viral subtypes and characterization of within-host diversity are crucial for risk assessment and vaccine development. Although metagenomic sequencing facilitates early detection, prevalent memory-efficient k-mer-based pipelines often discard critical linkage information. This loss of information can result in missed or imprecise pathogen identification, potentially delaying clinical and public health responses. We introduce PREMISE (Pathogen Resolution via Expectation Maximization In Sequencing Experiments), a probabilistic, alignment-based framework implemented in Rust for high-resolution viral genome identification. By integrating advanced string data structures for efficient alignment with a quality-score-aware Expectation-Maximization algorithm, PREMISE accurately identifies source strains, estimates relative abundances, and performs precise read assignments. This framework provides superior source estimation with statistical confidence, enabling the identification of mixed infections, recombination, and IAV-reassortment directly from raw data. Validated against simulated and empirical datasets, PREMISE outperforms state-of-the-art k-mer methods. Ultimately, this framework represents a significant advancement in viral identification, providing a foundation for novel approaches that can automatically flag reassorted viruses or recombination events in the future, thereby improving the detection of emerging pathogens with zoonotic potential.

Availability: https://github.com/sriram98v/premise under an MIT license.

Contact: sriramv@iastate.edu

20
Mutations and predicted glycosylation patterns in respiratory syncytial virus isolates correlate with disease severity.

Hunte, M. L.; Herbst, K. W.; Michelow, I. C.; Szczepanek, S. M.

2026-02-04 microbiology 10.64898/2026.02.03.703626 medRxiv
Top 0.1%
8.5%

Respiratory syncytial virus (RSV) remains an important cause of lower respiratory tract infections in young children, producing mild to life-threatening disease. Although rapid viral evolution through genetic drift is well established, the structural and functional impacts of specific pathoadaptive mutations linked to enhanced virulence are poorly defined. We investigated these relationships by isolating RSV from available nasal swabs of five hospitalized infants during the 2022-2023 winter season and conducting comparative viral genomic analysis. Severity of disease was evaluated using a validated clinical scoring system. Whole-genome sequencing followed by reference-guided assembly and structural modeling revealed distinct amino acid polymorphisms correlating with disease severity. Phylogenetic analysis placed all isolates within the RSV-A GA2.3.5 G clade. Isolates from mild, moderate, and severe cases clustered in the A.D.1.5 and A.D.1.8 subclades. Nineteen amino acid differences were associated with clinical severity, and isolates from moderate or severe cases replicated more rapidly in vitro than mild isolates. Computational glycosylation predictions indicated an increasing number of glycosylation sites in the G protein corresponding with greater disease severity. Together, these data suggest that specific pathoadaptive mutations may contribute to enhanced viral replication and severity, and are relevant for future surveillance efforts and the development of immune-based strategies targeting virulence-associated residues.